This document:
Reads in the CTD files
Does QAQC if necessary,
Finds casts taken at the same reservoir on the same day for comparisons,
Plot comparison casts for each date, variable, and reservoir
Make a regression plot for each variable
Run t-tests for each variable
Plot and run Bland Altman Test
Plot and run Passin Bablok Test
## Warning: Removed 1646 rows containing missing values (`geom_point()`).
## Warning: Removed 14345 rows containing missing values (`geom_point()`).
## Warning: Removed 14345 rows containing missing values (`geom_point()`).
## Warning: Removed 2064 rows containing missing values (`geom_point()`).
## Warning: Removed 2064 rows containing missing values (`geom_point()`).
## Warning: Removed 9716 rows containing missing values (`geom_point()`).
## Warning: Removed 9716 rows containing missing values (`geom_point()`).
## Warning: Removed 40976 rows containing missing values (`geom_point()`).
## Warning: Removed 40976 rows containing missing values (`geom_point()`).
## Warning: Removed 1646 rows containing missing values (`geom_point()`).
## Warning: Removed 40367 rows containing missing values (`geom_point()`).
## Warning: Removed 40442 rows containing missing values (`geom_point()`).
## Warning: Removed 40420 rows containing missing values (`geom_point()`).
## Warning: Removed 12908 rows containing missing values (`geom_point()`).
## Warning: Removed 12908 rows containing missing values (`geom_point()`).
## Warning: Removed 44 rows containing missing values (`geom_point()`).
## Warning: Removed 44 rows containing missing values (`geom_point()`).
## Warning: Removed 785 rows containing missing values (`geom_point()`).
## Warning: Removed 785 rows containing missing values (`geom_point()`).
## Warning: Removed 18652 rows containing missing values (`geom_point()`).
## Warning: Removed 18652 rows containing missing values (`geom_point()`).
## Warning: Removed 21043 rows containing missing values (`geom_point()`).
## Warning: Removed 21055 rows containing missing values (`geom_point()`).
## Warning: Removed 21057 rows containing missing values (`geom_point()`).
## Warning: Removed 15 rows containing missing values (`geom_point()`).
## Warning: Removed 15 rows containing missing values (`geom_point()`).
## Warning: Removed 1520 rows containing missing values (`geom_point()`).
## Warning: Removed 1520 rows containing missing values (`geom_point()`).
## Warning: Removed 2615 rows containing missing values (`geom_point()`).
## Warning: Removed 2615 rows containing missing values (`geom_point()`).
## Warning: Removed 3010 rows containing missing values (`geom_point()`).
## Warning: Removed 3175 rows containing missing values (`geom_point()`).
## Warning: Removed 3022 rows containing missing values (`geom_point()`).
## Warning: There was 1 warning in `filter()`.
## ℹ In argument: `==...`.
## Caused by warning in `variable == c("Temp_C", "DO_mgL", "DOsat_percent", "Cond_uScm", "SpCond_uScm",
## "Chla_ugL", "Turbidity_NTU", "PAR_umolm2s")`:
## ! longer object length is not a multiple of shorter object length
## `summarise()` has grouped output by 'Reservoir', 'Site', 'Date', 'SN',
## 'variable'. You can override using the `.groups` argument.
## Warning: Removed 1330 rows containing missing values (`geom_point()`).
R Script to compare methods in the Analytical Lab Authors: Bobbie Niederlehner Last Edited: 06/07/2022
Steps 1) load the libraries and the data 2) remove lines with missing values for either analysis 3) OPTIONAL: restrict the concentration range being analyzed 4) Paired T-Test- Test differences = 0 with paired T (NOT a preferred way to look at these data but magnitude of difference is interesting) 5) Bland Altman - Visualize differences over concentration (difference over mean concentration) 6) Passing Bablok - Test coincidence across concentrations - regression of new vs reference method. Does it have a slope of 1 and an intercept of 0
BACKGROUND INFO: Some web resources explaining method comparison regressions https://www.r-bloggers.com/deming-and-passing-bablok-regression-in-r/ https://www.r-bloggers.com/2015/09/deming-and-passing-bablok-regression-in-r/ http://labrtorian.com/tag/passing-bablok/ https://www.r-bloggers.com/2016/08/a-shiny-app-for-passing-bablok-and-deming-regression/ https://bahar.shinyapps.io/method_compare/
Notes NOTE: the order of the variables in each analysis can matter FOR paired T-test it doesn’t matter. Just pay attention to sign for difference. Difference is calculated as 1st specified (x) minus 2nd specified (y) FOR Bland Altman it doesn’t matter, Default difference is 1st specified (x) minus 2nd specified (y) FOR Passing Bablok first listed is reference method (x, Method1, or current), second listed is is test method (y, Method2, or new) The intercept is often smean difference, but will have the reverse sign.
## [1] "Chla_ugL"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = 20.519, df = 808, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 1.500423 1.817859
## sample estimates:
## mean difference
## 1.659141
##
## [1] "Cond_uScm"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = 4.2479, df = 937, p-value = 2.374e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 1.566278 4.256278
## sample estimates:
## mean difference
## 2.911278
##
## [1] "DO_mgL"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = -8.2185, df = 511, p-value = 1.707e-15
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -1.5083164 -0.9263218
## sample estimates:
## mean difference
## -1.217319
##
## [1] "DOsat_percent"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = -8.0353, df = 511, p-value = 6.507e-15
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -14.316489 -8.691145
## sample estimates:
## mean difference
## -11.50382
##
## [1] "PAR_umolm2s"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = -3.0679, df = 1017, p-value = 0.002212
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -21.81894 -4.79576
## sample estimates:
## mean difference
## -13.30735
##
## [1] "SpCond_uScm"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = 4.1874, df = 937, p-value = 3.087e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 2.055574 5.681799
## sample estimates:
## mean difference
## 3.868686
##
## [1] "Temp_C"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = -2.3129, df = 942, p-value = 0.02094
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.06615132 -0.00542224
## sample estimates:
## mean difference
## -0.03578678
##
## [1] "Turbidity_NTU"
##
## Paired t-test
##
## data: a$obs_S7809 and a$obs_S8188
## t = -0.97952, df = 808, p-value = 0.3276
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -2.354717 0.786967
## sample estimates:
## mean difference
## -0.7838751
# Bland Altman Plots (difference over mean)
# Estimates an agreement interval, within which 95% of the differences fall.
# BUT criterion for agreement should be decided on apriori and is completely based on judgement - what can you tolerate?
for(b in 1:length(var)){
a <-round_CTD%>%
filter(variable==var[b])
print(var[b])
ba<-bland.altman.stats(a$obs_S7809,a$obs_S8188, conf.int = 0.95)
#}
print (c("mean difference", "lower limit", "upper limit","sample size"))
print (c(ba$mean.diffs, ba$lower.limit, ba$upper.limit, ba$based.on))
# IF you prefer GGPLOT2 specify that graph.sys
# geom_count = TRUE handles overlapping points by making the plot symbol larger
ba2 <- bland.altman.plot(a$obs_S7809,a$obs_S8188, graph.sys = "ggplot2", conf.int=.95, geom_count = F)
print (ba2 +
xlab("Average of Two Measurments") +
ylab ("Difference of Two Measurments") +
ggtitle(paste0(var[b]," Bland Altman plot")) +
theme_bw()+
theme(plot.title = element_text(hjust = 0.5)))
}
## [1] "Chla_ugL"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] 1.659141 -2.848578 6.166860 809.000000
## [1] "Cond_uScm"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] 2.911278 -38.229295 44.051852 938.000000
## [1] "DO_mgL"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] -1.217319 -7.786362 5.351724 512.000000
## [1] "DOsat_percent"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] -11.50382 -74.99774 51.99011 512.00000
## [1] "PAR_umolm2s"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] -13.30735 -284.56092 257.94622 1018.00000
## [1] "SpCond_uScm"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] 3.868686 -51.590424 59.327797 938.000000
## [1] "Temp_C"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] -0.03578678 -0.96704939 0.89547583 943.00000000
## [1] "Turbidity_NTU"
## [1] "mean difference" "lower limit" "upper limit" "sample size"
## [1] -0.7838751 -45.3971128 43.8293627 809.0000000
Method Comparison Regression using nonparametric Passing Bablok (MORE RESISTANT TO OUTLIERS) The first listed variable definitely ends up on the X axis and documentation says it is the “reference” method The second listed variable ends up on the y axis and documentation says it is the “test” method
## [1] "Chla_ugL"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 809
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept -0.1517648 NA -0.1972359 -0.1166131
## Slope 0.7429368 NA 0.7309313 0.7541139
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept -0.15176 -0.15177 0.00000 0.01958
## Slope 0.74294 0.74251 -0.00043 0.00595
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"
## [1] "Cond_uScm"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 938
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept 0.5023268 NA 0.3774935 0.6211481
## Slope 0.9858717 NA 0.9818603 0.9901045
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept 0.50233 0.50275 0.00042 0.06354
## Slope 0.98587 0.98586 -0.00001 0.00211
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"
## [1] "DO_mgL"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 512
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept -0.06030551 NA -0.118805 -0.01640155
## Slope 1.22419276 NA 1.192082 1.26880825
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept -0.06031 -0.06117 -0.00087 0.02617
## Slope 1.22419 1.22652 0.00232 0.02161
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"
## [1] "DOsat_percent"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 512
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept -0.4356803 NA -0.942243 0.04528191
## Slope 1.2128068 NA 1.169759 1.26219676
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept -0.43568 -0.42338 0.01230 0.23720
## Slope 1.21281 1.21401 0.00121 0.02332
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"
## [1] "PAR_umolm2s"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 1018
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept -0.005063738 NA -0.008550204 -0.002357759
## Slope 0.979424973 NA 0.957568075 1.007393998
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept -0.00506 -0.00518 -0.00012 0.00158
## Slope 0.97942 0.98037 0.00095 0.01276
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"
## [1] "SpCond_uScm"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 938
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept 0.3487772 NA 0.1664397 0.5810292
## Slope 0.9947974 NA 0.9888955 0.9994599
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept 0.34878 0.35898 0.01021 0.10765
## Slope 0.99480 0.99460 -0.00019 0.00278
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"
## [1] "Temp_C"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 943
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept -0.04097462 NA -0.06446333 -0.02195253
## Slope 1.00335383 NA 1.00170323 1.00537231
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept -0.04097 -0.04153 -0.00056 0.01119
## Slope 1.00335 1.00341 0.00006 0.00096
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"
## [1] "Turbidity_NTU"
## [1] "Passing Bablok"
##
##
## ------------------------------------------
##
## Reference method: obs_S7809
## Test method: obs_S8188
## Number of data points: 809
##
## ------------------------------------------
##
## The confidence intervals are calculated with
## bootstrap ( quantile ) method.
## Confidence level: 95%
##
##
## ------------------------------------------
##
## PASSING BABLOK REGRESSION FIT:
##
## EST SE LCI UCI
## Intercept -0.3725648 NA -0.5934450 -0.2461188
## Slope 0.9621016 NA 0.9449411 0.9802377
##
##
## ------------------------------------------
##
## BOOTSTRAP SUMMARY
##
## global.est bootstrap.mean bias bootstrap.se
## Intercept -0.37256 -0.38288 -0.01032 0.08641
## Slope 0.96210 0.96204 -0.00006 0.00894
##
## Bootstrap results generated with environment RNG settings.
## [1] "pearson correlation coefficient"